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Abstract —A non-parametric technique to identify weak 
sources within dense sensor arrays is developed using a network 
approach. No knowledge about the propagation medium is 
needed except that signal strengths decay to insignificant levels 
within a scale that is shorter than the aperture. We then 
reinterpret the spatial covariance matrix of a wave field as a 
matrix whose support is a connectivity matrix of a network of ver¬ 
tices (sensors) connected into communities. These communities 
correspond to sensor clusters associated with individual sources. 
We estimate the support of the covariance matrix from limited¬ 
time data using a robust hypothesis test combined with a physical 
distance criterion. The latter ensures sufficient network sparsity 
to prevent vertex communities from forming by chance. We 
verify the approach on simulated data and quantify its reliability. 
The method is then applied to data from a dense 5200 element 
geophone array that blanketed 7 x 10 km of the city of Long 
Beach (CA). The analysis exposes a helicopter traversing the 
array, oil production facilities, and reveals that low-frequency 
events tend to occur near roads. 

Index Terms —Array processing, network analysis, coherence. 
I. Introduction 

L arge aperture sensor arrays with dense spatial sampling 
are becoming more common as the cost for sensor and 
communications hardware decreases. Examples of such arrays 
are the USArray initiative in seismology with 500 stations 
covering large parts of the continental US Q or dense seismic 
arrays for seismic exploration with 5200 sensors as presented 
here. 

This work addresses the task of localizing weak sources 
within a fraction of a dense array. Such sources naturally 
become more common as the array aperture increases. The 
task is solved by considering a sensor array as a network 
of vertices (the sensors) where the relation between vertices 
is dehned by high coherence and spatial proximity of the 
underlying sensors. Eor the case where sources within the 
array are not too close the resulting network has several 
disconnected components, each corresponding to a spatially 
contiguous sensor cluster sensing one of the sources. 

Source localization with sensor arrays in complex environ¬ 
ments has been addressed using the model-based framework 
of matched held processing (MEP). The method originates 
in ocean acoustics 0-0 but has found applications in 
seismology 0,0 and electromagnetics 0,0. The structure 
of the covariance matrix plays an in important role in these 
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approaches, in particular for data-adaptive implementations 
using, e.g., MVDR | |T0| or MUSIC m- But MEP methods 
rely on knowledge about the propagation medium in some 
way. If such information is not sufficiently available their 
applicability may be severely limited. 

Our approach assumes limited knowledge about the prop¬ 
agation medium. The only assumption made is that weak 
signals enter the noise Hoor within a problem-dependent 
distance d that is smaller than the array aperture. That re¬ 
quirement is realistic for large arrays in moderately attenuating 
media such as the earth. If the sources are not too close 
we will see that the covariance matrix has a structure that 
suggests a reinterpretation: the matrix of spatial covariance 
of a wave held at different sensor locations is turned into 
the connectivity matrix that describes the organization of a 
network of sensors. Prom that vantage point hnding weak 
within-aperture sources becomes identifying connected sensor 
communities. The general applicability of the method comes at 
the cost of precision as a source is only localized through the 
identihcation of the sensors on which it has had a signihcant 
impact. By dehnition this set of sensors typically corresponds 
to a small fraction of sensors in a dense array. 

We are not aware of any method that solves this source 
localization problem. It is conceivable that methods based 
on singular value decomposition of the coherence matrix or 
the correlation of the evolution of covariance matrix elements 
could also be used. These approaches are not developed here. 
Our network framework uses the covariance matrix structure 
to identify subspaces within that matrix that represent local 
sources. 


In section we dehne the problem setting and explain 
how the array covariance matrix has a structure that allows 
it to be reinterpreted as the connectivity matrix of a network. 
In section [111] we describe why we use coherence instead 
of the covariance and how the distance d can be estimated. 
Section IV hrst introduces some network terminology and 
then describes how an appropriate network can be constructed 
from the coherence matrix. We verify and test the reliability 
of the technique on simulated data in section |V] This is 
followed in section |Vl] by an application to real data from 
a 5200 sensor geophone array that was deployed in the city 
of Long Beach (CA). 
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II. Covariance Matrix Support defines a Network 

Consider a large aperture array with N sensors distributed 
densely over spatial locations {ri}j=i , jy. We assume that 
there are weak sources within the aperture that produce signals 
that can propagate through space. The channel between any 
such source location p and sensor location is characterized 
by a Green’s function, g{ri,p). Let the vector g(p) = 
[g{ri, p),..., g{r]s[, p)]'^ G be the frequency domain 
response of the array to a source at location p (equations in 
this article apply to the frequency domain). Consider then pi 
to span a grid of M possible source locations 
with each such possible location having an associated response 
g(P*) = Si- We define the matrix G = [gi,...,gM] 
containing the array response to a source all of those locations. 
If we let the source signals Si at those possible locations pi be 
collected in the source vector s = ..., smY ^ then 

the measured signal at the N array sensors is modelled as: 


X = G s + n , 


( 1 ) 


where n = [rii,..., n^v] S C'^ is a multivariate i.i.d. noise 
process. A common approach to estimate the source vector 
s and hence localize the sources is matched-field process¬ 
ing (MFP) 0. The method has been applied to acoustic Q, 
1^, seismic 0. 0. and electromagnetic Q, Q wave fields. 

An important requirement for MFP to work is the knowl¬ 
edge of the propagation medium encapsulated in p(r, p). Con¬ 
sider now a grid-free formulation for the observation vector 
assuming a source distribution function s(p) with sources at 
locations pi,..., pK- 


x = / g{p)s{p)dp + n 


= / g(p) 


K 


'^SiSip- Pi) 




dp + n 


K 


= X! Si + n . 


The covariance matrix in this scenario becomes: 
c = (xx") = {Y. ^ ’ 

= II {s^s*)s^sf + (nn^) 

K 

= ^ |sip g,gf -f D„ , 


( 2 ) 

(3) 

(4) 

(5) 


( 6 ) 


where we exploit the mutual independence between the 
source and noise processes (sin*) = Q,(nin*) = {siS*) = 
5ij: and D„ is a diagonal matrix whose diagonal element 

Da is the noise variance of sensor i. 

We now assume that information about g{r,p) is severely 
limited: we only know that the source signal at location pi 
has no significant effect at a sensor beyond a distance d, i.e. 
the fc-th element g^^^^ = 0 if |pi — rfc| > d. This is equivalent 
to saying that the support of gi indicates the sensors affected 
by the source at location p^. This means that if all sources are 


mutually separated by at least 2d, \pi — Pj\ > 2d Vi ^ j, then 
the corresponding support sets of the gi do not overlap. 

Let us define the support indicator function X(v) of a vector 
or matrix. The lack of overlap of the g^ and the support- 
indicator function properties (Appendix allow us to write 
the support of the sum in ^ as 

^ = H2:(gigf) (7) 

K 

= ^I(g,)T(gz)^ (8) 

i-l 

This support of C has a very useful structure: In appendix [B] 
we show that a network with N vertices and K connected 
components has a connectivity matrix that has a support that 
is equal or less than that of ([^. Consequently, the connected 
components of a network with connectivity matrix 1(G) will 
lead us to the support of the g^ and hence to the sensor 
clusters that sensed the K sources (the addition of the diagonal 
term does not affect this fact). In sections and we 
describe how an array network that retains this property can 
be constructed based on a limited number of snapshots. 

III. Coherence and its Spatial Decay 

In this section we develop a robust hypothesis test to estab¬ 
lish the support of the covariance matrix. Using a simulation 
we then look into how coherence values decay as a function 
of receiver-parr separation. This guides us to a value for d. 

A. Coherence-based hypothesis test 

Consider a zero-mean stationary signal u(r) sampled at 
intervals At. We define its discrete Fourier transform over a 
period Tw=NAt starting at time t as: 

N-l 

Xk{f, i) = H Uk{t + jAt)wj , (9) 

j=o 

where frequency is discretized as / = k = 0,..., N/2 
and the weights w{j) are used to control spectral leakage ( 
m). A sample estimate for the covariance in 0 can be 
computed using M time snapshots of Xk{f,t) (frequency 
dependence is implicit): 

M-l 

= mYI > ( 10 ) 

t=0 

This sample estimate will be subject to the vagaries of the 
noise processes at the receivers r^. A customary attempt 
to reduce the impact of this location-specific variations is to 
compute the coherence instead: 

M Xi{t)Xj{t) 

For our purposes the coherence is equally useful as the 
covariance because our interest lies in the support of the 
covariance matrix which, in the limit M —> oo, is identical 
to that of the coherence. 






3 


A hypothesis test can be used to decide if an element 
Cfj is sufficiently different from the the probability density 
function (PDF) it would have in the absence of a shared 
signal. Unfortunately, despite the inherent normalization the 
PDF of the sample coherence is not invariant to changes of 
the noise variance over time in rii and rij, a very typical type 
of non-stationarity encountered in real data. We demonstrate 
this for the case of M=19 snapshots of independent noise 
signals 71^(1),..., ni{M) and nj{l),..., nj{M) with variance 
being a function of snapshot I, i.e. n{l) ^ CAf{Q, cr'^{l))- Three 
situations are considered: 

Stationary: af{l) = = 1 for all I (both signals 

are stationary). Power step (overlap): Cr2(1...5)=cr2(i _ 5)^ 
20 and erf(6...19 )=(t|( 6...19)=! (both signals have high 
variance on initial five snapshots and then step down in 
power). Power step (no overlap): af(1...5)=20,af(6...19)=l 
and (t|( 6...19)=20,erf(1...5)=1 (first signal starts out with 
high variance and steps down while second signal starts with 
low variance and steps up). 

Figure iK shows the simulated PDF of (7®^ for the three 
scenarios (based on 10® realizations). The pdf of the sample 
coherence 0 substantially deviates from the stationary case 
for the two non-stationary scenarios considered. Such non- 
stationarities are, however, common in seismo-acoustic noise 
processes and cannot be assumed known. A reliable hypothesis 
test can therefore not be based on the PDF of the sample co¬ 
herence (111. We therefore consider the phase-only coherence 
matrix (PCM), a robust definition of the coherence HD: 

\xM ■ 


Cij = — V 
^ M ^ 


The PCM in ([Tg ignores magnitudes and only relies on 
phase information. Figure shows the PDF of Cij for 
the same scenarios as before (blue lines) and demonstrates 
how the distribution of this statistic is invariant for the types 
of non-stationarities. We use the coherence •HI to test this 
hypothesis: ’’the signals observed at locations and Vj based 
on M = 19 snapshots are uncorrelated”. The hypothesis is 
accepted if \Cij\ < Ca and rejected otherwise. The threshold 
coherence magnitude Cq is set such that the probability of 
falsely rejecting the hypothesis is a, formally: 


c„ = cdr\l-a), (13) 

where cdf~^(-) is the inverse of the cumulative distribution 
function estimated from the simulation in Figure To 
provide an idea about the likelihood of falsely accepting the 
null-hypothesis Figure EP also shows the simulated pdf of 
\Cij I for the case where there is a signal present with SNR=3. 
Table gives the decision threshold Ca and false acceptance 
rate /3 given the false rejection rate a. For the remainder of 
this article we use coherence as a short-hand for the magnitude 
of the phase-only coherence ([T^. 


B. Spatial decay of coherence 

The null-hypothesis rejection rate should be higher than 
a for a sensor pair in the proximity of a source. If fact, 
for sources with a local imprint this excess rejection rate 


TABLE I 

Coherence magnitude decision threshold Cq based on 

SIMULATION FOR CONFIDENCE LEVELS a (I.E. FALSE REJECTION 
PROBABILITY). THE FALSE ACCEPTANCE PROBABILITY (3 IS GIVEN FOR 
THE CASE SNR=3. 


a 

^a. 

/3 (SNR = 3) 

0.010 

0.484 

0.0768 

0.005 

0.517 

0.118 

0.001 

0.582 

0.247 




Fig. 1. (A) The PDF of the sample coherence El} for independent noise- 

processes rii and Uj wliose statistics vary as a function of snapshot index 
according to three scenarios, two of which describe a non-stationary process 
(see the text for details). (B) The PDF of sample coherence ED for the 
three scenarios is stable (blue lines overlap). We use this PDF to test the 
hypothesis that two signals share no common component. At the decision 
threshold =0.484 the hypothesis is falsely rejected with probability a=0.01. 
The PDF of \Cij\ for the case of a SNR = 3 signal being present in the 
noise is also shown (orange). The false acceptance rate is /3 = 0.0768. 


depends on receiver pair separation. The following simulation 
demonstrates this. Consider a square array on a homogeneous 
half-space with aperture 2.8 x 2.8 km and sensors placed on a 
rectangular grid spaced 90 m, i.e. K = 1024 sensors at posi¬ 
tions ri,... (Figure]^. We simulate three source signals 
si(t), S 2 {t), S 3 (t) at locations pi,p 2 , Ps (blue crosses) which 
spread spherically across the array at velocity u=340 ms“^, 
generating time-domain sensor data at location r^: 

It 

where Si(T)~A/’(0,cr^), n(r)^A/’(0,cr^), si_Ls 2 -Ln are mutu¬ 
ally uncorrelated. An arrival time perturbation e ^ crl) 
with a^=3Q ms is introduced to emulate the situation where 
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X [m] 


Fig. 2. Simulation layout of a 2.8x2.8 km array with 1024 sensor spaced 
90 m apart (dots) with three sources (blue cross, see text for details). A set 
of 19 snapshots was simulated with SNR = 200 at 10 m distance from 
each source. The colored triangles are sensor groups representing connected 
subgraphs from the simulation. The colored enclosures show the convex hull 
of each sensor clusters. 


precise knowledge of the propagation channel is lacking, e.g. 
due to meteorologic convections in acoustics or unknown het¬ 
erogeneities in seismic data. The simulation assumes cr^/cr^= 
SNR=200 at 10 m from the source and sampling period 
of Af=4 ms. Fourier coefficients Xk{f,t) are computed 
according to (j^ with =2561.02 s at 20.51 Hz. 

A sample coherence matrix Cij{f) ( [T2| is calculated from 
M=19 snapshots of Xk The receiver-pairs are grouped 
into 50-m distance bins from 50-1000 m with each bin 
containing between 3.5T0^ to 3OT0^ sensor pairs. In each 
group we compute the fraction of pairs that reject the null 
hypothesis (\Cij \ > Ca) divided by the total number of pairs; 


Hhj\ ICtjl > Cg} 
Hhj} 


(15) 


where # refers to the cardinality of the set. We repeat the 
process for 1710 simulation runs, giving us 1710 fractions for 
each distance bin. Figure[^gives the 10-90 percentile range of 
estimated fractions. The fractions decrease systematically with 
distance and for large separation are centered on the proba¬ 
bility of falsely rejecting the random wave field hypothesis, 
a = 0.01 (dashed line). For receiver pairs with large separation 
the wave field (jT^ has decayed so much that the fraction is 
not different from the expected value for a random wave field. 
For distances less than 300 m, however, the observed fraction 
is above a = 0.01. 


IV. Constructing a Network from Array Data 
In section we indicated that the structure of the support 
of the covariance matrix can be used to find coherent clusters 
using a network analysis approach. In this section we describe 
how an appropriate network can be constructed to find the 
targeted sensor clusters. We also provide an estimate on 
the false detection rate of the approach. We start out by 
introducing some terminology from network theory. 



sensor-pair distance [m] 


Fig. 3. The sensor pairs i,j of the simulated anay are grouped into 50 m 
distance bins from 50-1000 m based on dij = |ri — rj\. For each distance 
bin we calculate the fraction F of sensor pairs where the random wave 
field hypothesis is rejected (|Ciy| > c^, with a = 0.01). The shaded area 
indicates the 10-90 percentile range of these fractions from 1710 independent 
simulations. 


A. Networks, connectedness, random networks 

An undirected and unweighted network G consists of a set 
of vertices and edges {eij}j>i G {0,1} where 

Cij = 1 means that Vi and Vj are connected. The edges define 
a symmetric connectivity matrix Eij=Eji=eij. The number 
of edges leading to vertex Vi is its degree di. The mean vertex 
degree 7 of a network is the average over the vertex degrees 
of all its vertices. If 7 tends to a constant as N increases the 
network is sparse p^ . The notation |G| refers to the number 
of vertices N in network G. 

A connected component U C G is a subset of vertices 
and edges in G for which every pair of vertices n,n' G S is 
connected directly or indirectly through a sequence of edges 
in S. Finding connected components is a basic task in network 
analysis GD- Let S be the only connected component in G 
and let u = [mi, ..., un]'^ be the vertex indicator vector of U 
where Ui = 1 if i G U and 0 otherwise. It can be easily 
verified that 

(E)y = {uu^),j = u,Uj ( 16 ) 

is the connectivity matrix of that network when S is fully 
connected, i.e. all vertex pairs in U are connected. From all the 
possible edge configurations that will leave U connected this 
is the one with the most edges. Therefore the support of the 
connectivity matrix of any connected component containing 
the vertices in U will have the same or a smaller support 
than E. If we were to connect a second subset V C G with 
indicator vector v that does not share vertices with U, VnU= 
0, the resulting connectivity matrix would be simply the sum 
uu^ -f vv^ (appendix]^. It follows by induction that if there 
are several connected subsets Ui in G for which Ui n Uj = 
% yi ^ j the resulting connectivity matrix is 

E = ^u,;uf, (17) 

i 

with Ui the indicator vector of subset Ui. As before, any edge 
configuration where all subsets remain connected will have the 
same or a smaller support than that of E. 
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N 

Fig. 4. The transition probability pt where most of the vertices in a random 
network Go{N,p > pt), will be connected. 


Finally, consider the random, unweighted network Go{N,p) 
with N vertices where all pairs of vertices have the same 
probability p of being connected. The mean vertex degree in 
Go{N,p) is therefore 7 = {N — l)p because every vertex can 
connect with all TV — 1 other vertices with equal probability. 
A large fraction of vertices in a random network tend to be 
connected when the mean degree 7 exceeds one p4) , with 
about 90% being connected when 7 > 2.5. This means that for 
a connection above the transition probability pt = 2.5/{N— 1) 
most vertices will be connected. Figure]^ shows the hyperbolic 
relation in which the transition probability pt decreases as 
the network grows larger. Note that random networks are not 
sparse since their mean vertex degree depends on the network 
size N. 


B. Constructing an array network 

We saw in ([^ that the support indicator of the covariance 
matrix is a sum of K outer products of non-overlapping 
support vectors with themselves (with all diagonal elements 


equal to one). From (171 we now see that this corresponds 
exactly to the connectivity matrix of a network with N vertices 
and K fully connected components. 

The indicator vector of the T-th component would be identi¬ 
cal to the support indicator function I(gi). Finding the sensor 
clusters affected by the K sources is thus equivalent to finding 
the connected components of a network with connectivity 
matrix 1(C) (note that the covariance and coherence matrices 
share the same support). Armed with this insight we will 
now construct a coherence network with the following 
connectivity matrix: 




1 if \Cij\>Ca 
0 otherwise 


( 18 ) 


The support of the covariance is thus estimated based on 
the hypothesis test formulated in section |III-A| This straight¬ 
forward construction of an array network, however, is insuf¬ 
ficient because of the statistical fluctuations of the hypothesis 
test. In a random wave field the probability of observing 
I Gy I > Cq, is a for all receiver pairs and so in this case G° is 
in fact a random network Gq (W, a) as defined in the previous 
section. As discussed there G° will tend to be connected for 
reasonable values of a, in particular if the array has more 


than a few hundred vertices (see Figure |^. If all vertices 
are connected then any attempt to find smaller connected 
components due to a physical source is rendered futile. The 
possible remedy of decreasing a is not appropriate: the value 
cannot be made arbitrarily small because the corresponding 
hypothesis test would become overly conservative with a high 
false acceptance rate (3 (see Table |I]i. 

We now slightly modify ( fTS] ! to define the localized coher¬ 
ence network G(cq,, dmax): 

f 1 if |Gy I > c„ and | < d^ax 

— 1 r» 1 • 

I 0 Otherwise. 

We thus require not only that the coherence between any 
two sensors must be sufficient to reject the random signal 
hypothesis but also that the distance between the sensor 
locations must be less than dmax- By design the vertices of 
a connected component of G represent a collection of sensors 
that are geographically close and share a common signal 
component. 

Enforcing spatial localization of connections in the network 
limits the number of potential neighbors a vertex can connect 
to from the set of all vertices to just the set of vertices associ¬ 
ated with spatially nearby sensors. This number is essentially 
invariant to the size of the array and can be controlled via dmax- 
The network is therefore sparse for large arrays. 

Note that this does not preclude sensor clusters with a 
spatial extend of more than dmax^ as long as the true area 
of increased coherence is contiguous in space (at the scale of 
dmax) its full extend should be recoverable through indirect 
links in the connected component. To characterize the spatial 
extent of each such connected component Sj, a mean and 
covariance matrix for a two-dimensional Gaussian probability 
density function can be estimated from the sensor locations 
represented by the vertices of Sj: 

= ^ 20 ) 

r.GS, 

Y. ( 21 ) 

These values should not be confounded with actual lo¬ 
cation estimates. Source directionality, physical obstacles or 
attenuation heterogeneities in the propagation medium can 
cause clusters that are not centered around a source. Sensor 
geometry, gaps in the array, and boundaries will also cause 
a cluster to move away from its source. The sensor cluster 
does, however, provide an approximate indication about source 
location. It can also serve as a data subset selection for 
follow-up analysis with more precise array processing since 
by definition its sensors contain significant signal levels from 
a given source. 


C. Random size of connected components 

To demonstrate how the distance constraint in ([T^ prevents 
connected components to form by chance even in large arrays 
we look at the 1089 sensor layout of the simulation discussed 
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Fig. 6 . The connectivity matrix for 1024 sensor array simulation. This matrix 
is used to make the detections in Figure The markers on the non-zero 
elements are enlarged for better visibility. 


Fig. 5. Estimated probability of the size of connected components Gtq 
through reference vertex ro from 10000 random trials (A). The array network 
is constnacted using cq.oi = 0.49 and dmax = 300 m. Only six trials 
led to connected components with |Grol ^ 10. The analytical probability 
distribution of vertex degrees for a random network Go(A^, o: = 0 . 01 ) is 
shown in (B). The mean vertex degree 7 of the localized coherence network 
G{col^ dmax) is shown for comparison. 

in section [Hl-B | without signal sources. The probability distri¬ 
bution of the size of connected components is estimated for 
10000 realizations of a random wave field. First, the sample 
coherence matrix IC^ I from M = 19 random snapshots from 
all sensor vertices is computed and then the corresponding 
array network C?(cQ,=o.oi)<^max=300 m) is constructed and the 
size of the connected component containing a reference vertex 
in the center of the array, [Grol, is then stored. The simulated 
probability distribution over |Gro| is shown in Figure |^, 
which illustrates that most connected components in a random 
array will have a very small size. Of the 10000 trials there 
were six cases where |Gro| > 10. Finding a coherent sensor 
cluster with 10 or more vertices is therefore highly improbable 
to occur by chance. 

The mean vertex degrees of the 10000 localized coherence 
networks G(cq, dmax) were averaged and we found 7=0.2883. 
This value is compared in Figure with the theoretical 
probability distribution of the vertex degree for a random 
network G{N, a) of same size. Clearly, when no spatial 
localization is enforced the mean degree for the given array 
will be far above the threshold 7=1 where the network will 
connect. 


300 m. For each set of snapshots we therefore construct 
a network G(cQ=o.oijt^max=300 m) and find its connected 
components with more than ten vertices. Figure shows 
the array configuration with the locations of the sources 
and the sensor clusters found in one sequence of snapshots. 
Figure shows the connectivity matrix for the constructed 
network G(cQ=o.oijdmax=300 m). It illustrates the sparsity of 
the network. 

The performance of the approach is quantified as follows: 
for each simulation we count how many of the sources 
were not enclosed in the convex hull of a detected cluster 
(missed sources, i.e. false negatives) and at the same time 
how many of the detected clusters do not contain a source. 
Within 300 simulation runs (i.e. 900 sources to be detected) 
we found 27 missed sources (3.0%) and 37 spurious clusters 
(4.1%). A visual analysis of the simulation results (not shown) 
reveals that the spurious clusters where always associated 
with a nearby twin cluster that encompassed a source. These 
clusters therefore were probably caused by the presence of 
the source and do not correspond to the false positive rate 
established in section IIV-CI The observations indicate that a 
single source may occasionally give rise to more than one 
component around the source location. There were 922 clusters 
that were identified in the 300 simulation runs with a mean 
size of 24 vertices, indicating how small the relative size of 
the clusters is with respect to the array (2.3%). 

We define a boundary for the clusters using the l-cr proba¬ 
bility area of the PDF of group j (eq. [20)i: 


V. Verification on Simulated Data 


We again return to the simulation described in section III-B 


As discussed there, a sample coherence matrix is computed 
for every M snapshots and coherence magnitudes \Cij\ are 
found to hover around Cq for distances higher than about 


= {r I (r - - m*) < x?(0.69)} , (22) 

where xj the cumulative inverse -distribution, i.e. Ej 
contains 69% of the probability mass of the spatial PDF 
computed from the sensor locations of group j. 
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sensor-pair distance [m] 

Fig. 7. Fr acti on F of sensor pairs for which the null hypothesis is rejected 
at q;= 0.01 ^15| versus sensor separation dij = jr^ — rj|. The shaded ai'ea 
indicates the lU-90 percentiles of fractions during 12 hours from March lOth 
08.00-20.00h. Below about 300 m sensor separation there is an increasing 
fraction of pairs whose coherences cannot be explained by false rejections of 
the random wave field hypothesis. 

VI. Long Beach (CA) Geophone Array 

We demonstrate the technique on a geophone array that was 
deployed over an area of about 7 x 10 km in Long Beach 
(California, US) as part of an industrial seismic survey m- 
03- The array consisted of about 5200 geophones (OYO 
CT32D vertical velocity sensors with 10 Hz corner frequency) 
sampled at At=4 ms (see Figure]^). The sensor configuration 
is such that every sensor has on average one neighboring 
sensor within a perimeter of 90 m. We transform the velocity 
measurement data stream of each geophone into a sequence 
of Fourier coefficients Xk{f,t) using N = 256 samples 
(Tw=l-02 s) and a Hanning window wj 0 with 50% overlap. 
Linear trends in the segments are removed before computing 
Xk- For our analysis we compute the coherence matrix Cy on 
41 frequency bins from 9.8-48.8 Hz using M=19 snapshots 
(9.22 s). A matrix with about 5200^R:!27-10® entries is therfore 
computed for every frequency bin and time period. In a 
24 hour analysis period there are about 9400 time windows. 

As with the simulation in section |III-B| we first study 
the distance dependence of the fraction (|15|l, i.e. how much 
the null hypothesis rejection rate is above the background 
level a. For time segments consisting of M = 19 snapshots 
the sample coherence for all receiver pairs is computed at 
20 Hz following ( [T2) i. The receiver pairs are then grouped by 
geographic distance dij = jr^ — r^j into bins of width 25 m. 
The process is repeated for consecutive time segments from 
March 10th 08.00-20.00h (all time indications are local time). 
Figure shows the fraction of receiver pairs in each distance 
group for which \C\ij > Ca = 0.484 (Table |^, where we set 
a — 0.01. We find that given the coherence threshold Cq it 
makes sense to search for coherent receiver groups within a 
search radius of fimax=300 m to avoid all the false positives 
associated with larger distance sensor pairs. This observation 
was repeated for several other frequency bins in the 10-50 Hz 
band. We find in this dataset that the value of dmax depends 
only weakly on frequency but in general it will be problem 
dependent. 


Having established = 300 m we search for coherent 
receiver clusters in the data. We compute a coherence matrix 
Cij at 20.0 Hz from M = 19 consecutive snapshots which 
corresponds to an analysis window of 9.7 s. A localized array 
network G{ca, dmnx) is defined and all connected components 
with more than 10 vertices are identified. Figure shows 
the coherent groups found over four consecutive 9.7 s anal¬ 
ysis windows starting on March 11th, 10;48;48h. The period 
contains a 40 s stretch during which a seismic vibrotruck was 
operating in the Southeast of the array, which is confirmed by 
a cluster of coherent receivers in that area. 

Figure shows a sequence of coherent groups at 47 Hz 
for consecutive windows starting March 11th at 05.53h. They 
show a north-south motion over 6 km during the course of 
about 95 s. The corresponding velocity 60 m/s (134 mph) on 
the observed trajectory implies an aircraft moving across the 
array. Figure shows a spectrogram from a receiver within 
the trajectory of the moving source computed about 15 s after 
the coherence was observed. The observed Doppler shifts of 
/high//iow 1-4 ~ (1 -f ^)/(l - are consistent with 
the approximate velocity estimate. Finally, the narrow-band 
harmonics at multiples of 12 Hz suggest that the passage of a 
helicopter was captured. 

A. Statistics over frequency and time 

We perform the above analysis with the same parameteri¬ 
zation for 41 frequency bins from 9.8-48.8 Hz for 24 hours 
starting on March 10th. For every detected coherent sensor 
cluster we store the mean of the coordinates ( |20| , the area of 
the convex hull around the cluster, and the frequency and time 
at which the cluster is observed. More than 450,000 clusters 
with more than ten vertices were observed during the course of 
March 10th in the given frequency bins. Figure]^ shows the 
number of clusters found in each analysis hour and frequency 
bin and Figure gives the average area covered by those 
clusters. We find that in most frequency bands 90% of clusters 
cover less than 1.5% of the array aperture. 

The maps in Figure give a geographic overview of 
the identified cluster centers in the course of March 10th 
for frequency bands 9.8-19.5 Hz and 39.1^8.8 Hz, each 
containing 10 frequency bins. A wealth of information could 
be collected in the results. Three particular regions with more 
sources are identified. Our analysis considers every frequency 
bin separately. This can be readily extended to aggregate 
clusters in different frequency bands that occur at a similar 
location and time. 

VH. Conclusion 

We have introduced a source localization method for large 
aperture dense arrays that makes minimal assumptions about 
the propagation medium. The method requires that source 
signals exhibit significant strength only over a small distance 
within the array. The spatial covariance matrix of a wave field 
has been reinterpreted as a matrix whose support is a con¬ 
nectivity matrix of a network of vertices (sensors) connected 
into communities. These communities correspond to sensor 
clusters associated with individual sources. The support of the 







3745 

3744 

3743 

3742 

Z 3741 

I- 

D 

3740 

3739 

3738 

3737 

3736 



389 390 391 392 393 394 395 

UTME[km] 


Fig. 8. Connected components of the array network were used to find coherent sensor clusters in the Long Beach geophone array. The clusters from four 
9.7 s windows after 10.48h on March 11th are shown in (A). The Southern sensor group coincides with vibrotruck activity in the Southeastern part of the 
an'ay as part of the seismic acquisition operations. The dashed ellipses show the spatial extent F2 of the groups (22). (B) A helicopter fly-by is captured in a 
sequence of coherent groups at 47 Hz over consecutive analysis time periods (starting at 05.53h, each time segment is 9.7 s advanced). The colors change 
from red to blue as the analysis windows advance in time. The arrow points to the receiver from which the spectrogram (C) was computed about 15 s after 
the coherence at that location was observed. Narrow-band harmonics at multiples of 12 Hz imply a helicopter as the source and the observed Doppler shift 
of /high//low — 1-4 is roughly consistent with a velocity of about 60 m/s. 


covariance matrix has been estimated from limited-time data 
using a hypothesis test combined with a physical distance 
criterion. The latter was successful in preventing network 
communities from forming by chance 

The method was tested on simulated data from a dense 
array containing three concurrent sources. From a total of 
900 simulated sources only 3% were not detected. A real data 
example from a 5200 element seismic array demonstrates the 
high degree of detail that the method can reveal about man¬ 
made noise sources within the array. 

Appendix A 

Some remarks about matrix and vector support 
Define the support indicator function of a vector v G C^: 

' Tt ^ h - /1 if V* 7^ 0 

V = X{-v) where v, = (23) 

10 if Vi = 0 

and analogously an indicator function I(V) for a matrix V. 
We then have: 

I(vw^) = vw^ (24) 

X(av) = V (for a ^ 0) (25) 

I(V + W) = V + W - (V o W) , (26) 

where o is the Hadamard operator (element-wise multiplica¬ 
tion). Assume that the supports of v and w do not overlap, 
i.e. V o w = 0, the zero-vector. Therefore 

(vv^) o (ww^) = (v o w) (v o w)^ = 0 (27) 


is the zero matrix and consequently, from (261 and (271 
I(vv'^ + ww^) = vv^ + — (vv^ o ww^) 


(28) 


In other words, if the support of vectors v and w do not 
overlap, then the support of the sum of their outer products 
is simply the sum of the outer products of their support- 
indicators. 


Appendix B 

Connectivity Matrix operations 

Let G be a network with N vertices and let A, B C G be 
two subsets of vertices that do not overlap, AOB = 0. Define 
the indicator vector of a set of vertices A as: 

{ 1 if i G A 

, (29) 

0 \fi^ A 

Consider a network where A is the only connected com¬ 
ponent. In this case The connectivity matrix E for a network 
where A is the only connected component has a support that 
is equal or less than aa^^. 

Let E be the connectivity matrix of a network G. The 
following statements are equivalent: 

1) The network G has two (and only two) connected 
components A and B that are not connected with each 
other, Afi B = 0. 
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2) E has a support that is equal or smaller than aa^+bb^. 

Proof: Consider first the fully connected case, i.e. Oi = 
ttj = 1 Vi, j and likewise for b. If the first item is true, then 
GiOj = 0 if either i ^ A or j ^ A. Also, bibj = 0 if either 
i ^ B or j ^ B. Therefore we have Eij = UiOj + bibj > 0 
only if i,jGA or i,jGB and the second item is true. The 
other direction follows a similar logic to argument just given. 
Because the fully connected component contains all possible 
edges within a set of vertices any lesser connected components 
will have a subset of those edges, i.e. the connectivity matrix 
can only have fewer non-zero values. ■ 

Note that the union of the support of a connectivity matrix 
with the support of the unity matrix does not affect connected 
components that have a size larger than one. 
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Fig. 9. Over 450,000 coherent sensor clusters with more than ten vertices 
were observed during the course of March 10th between 9.8-48.8 Hz. The 
number of clusters found in each hour and frequency bin is shown in (A). 
The average spatial area covered by the convex hull of those clusters is shown 
in (B). A wide-band increase in cluster area is seen from 9-15h and around 
30^3 Hz in the early morning hours. 
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Fig. 10. Overview of all clusters identified during March 10th in the frequency range 9.8-19.5 Hz (A) and 39.1^8.8 Hz (B). Each dot represents the center of 
a sensor cluster. Of those clusters 90% cover less than 1.5% of the aiTay aperture ai'ea. Clusters in regions 1 and 2 coincide with oil production infrastmcture. 
Region 1 contains several pump jacks and drill rigs while region 2 contains the central pump facility. Note how the spatial distribution of detections differs 
for the two bands. Region 3 highlights a strip of Long Beach Blvd that coincides with a section of the Long Beach light rail metro seiwice. The strip is 
particularly active in low-frequencies. 
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